Introduction

A short review of the subject and the list of hypotheses discussed at meetings.

Table 1

diamonds %>%
  tableone::CreateTableOne(
    data = .,
    includeNA = TRUE,
    # strata = "visit",
    addOverall = TRUE
  ) %>%
  tableone::kableone()
                        Overall
n                       53940
carat (mean (SD))       0.80 (0.47)
cut (%)
   Fair                 1610 ( 3.0)
   Good                 4906 ( 9.1)
   Very Good            12082 (22.4)
   Premium              13791 (25.6)
   Ideal                21551 (40.0)
color (%)
   D                    6775 (12.6)
   E                    9797 (18.2)
   F                    9542 (17.7)
   G                    11292 (20.9)
   H                    8304 (15.4)
   I                    5422 (10.1)
   J                    2808 ( 5.2)
clarity (%)
   I1                   741 ( 1.4)
   SI2                  9194 (17.0)
   SI1                  13065 (24.2)
   VS2                  12258 (22.7)
   VS1                  8171 (15.1)
   VVS2                 5066 ( 9.4)
   VVS1                 3655 ( 6.8)
   IF                   1790 ( 3.3)
depth (mean (SD))       61.75 (1.43)
table (mean (SD))       57.46 (2.23)
price (mean (SD))       3932.80 (3989.44)
x (mean (SD))           5.73 (1.12)
y (mean (SD))           5.73 (1.14)
z (mean (SD))           3.54 (0.71)

Basic descriptive characteristics
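
The `strata` argument in the call above is commented out, and `visit` is not a column of `diamonds`. As a minimal sketch of how a stratified version of the table might look, the example below stratifies by `cut`. The choice of `cut` and the names `strata_var` and `tab_by_cut` are assumptions made for illustration only.

```r
library(ggplot2)  # provides the diamonds data set
library(tableone)

# Assumption: stratify by `cut`; any categorical column could be used instead
strata_var <- "cut"

tab_by_cut <- CreateTableOne(
  vars = setdiff(names(diamonds), strata_var), # summarise everything except the stratifier
  strata = strata_var,
  data = as.data.frame(diamonds),
  includeNA = TRUE,
  addOverall = TRUE # adds an "Overall" column next to the strata columns
)

# Hide the group-comparison p-values for a purely descriptive table
print(tab_by_cut, test = FALSE)
```

`addOverall` only changes the output when `strata` is set, which is why it has no visible effect in the unstratified table.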

Distribution of price by color

diamonds %>%
  ggplot(aes(price, fill = color)) +
  geom_density(alpha = .3) +
  labs(
    title = "Price by color", x = "Price",
    y = "Density"
  ) +
  theme_linedraw()

Missing values, categories, and distributions in one picture

tabplot::tableplot(diamonds)
Registered S3 methods overwritten by 'ffbase':
  method   from
  [.ff     ff  
  [.ffdf   ff  
  [<-.ff   ff  
  [<-.ffdf ff  
Missings, categories and distributions
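
The table plot shows missingness visually. A numeric check can complement it. The sketch below counts missing values per column using base R only; `missing_summary` is a name chosen for this example.

```r
library(ggplot2) # provides the diamonds data set

# is.na() on a data frame returns a logical matrix,
# so column sums and means give counts and shares of missing values
na_matrix <- is.na(diamonds)

missing_summary <- data.frame(
  variable    = colnames(na_matrix),
  n_missing   = colSums(na_matrix),
  pct_missing = round(100 * colMeans(na_matrix), 1),
  row.names   = NULL
)

missing_summary
```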

Are any variables correlated?

diamonds %>%
  select_if(is.numeric) %>%
  psych::pairs.panels(.,
    method = "pearson", # correlation method
    hist.col = "#00AFBB",
    density = TRUE, # show density plots
    ellipses = TRUE # show correlation ellipses
  )
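
The pairs panel is easiest to read alongside the underlying numbers. As a minimal sketch, the same Pearson correlations can be computed as a plain matrix with base R's `cor()`; `numeric_cols` and `cor_matrix` are names chosen for this example.

```r
library(ggplot2) # provides the diamonds data set

# Keep only the numeric columns
numeric_cols <- diamonds[sapply(diamonds, is.numeric)]

# Pearson correlation matrix, rounded for readability
cor_matrix <- round(cor(numeric_cols, method = "pearson"), 2)

cor_matrix
```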

More information on distributions with boxplots

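diamonds %>%
  select_if(is.numeric) %>%
  # reshape to long format: one row per (variable, value) pair
  gather(key = "ind", value = "values") %>%
  ggplot(aes(x = ind, y = values)) +
  geom_boxplot() +
  coord_flip() + # horizontal boxes with variable names on the vertical axis
  theme_minimal()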
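
Because `price` is in the thousands while the other variables are far smaller (see Table 1), a shared axis compresses most of the boxes. One possible alternative, sketched below, gives each variable its own panel and scale with `facet_wrap(scales = "free")`. The object name `p` and the use of `pivot_longer()` in place of `gather()` are choices made for this example.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

p <- diamonds %>%
  select(where(is.numeric)) %>%
  # long format: one row per (variable, value) pair
  pivot_longer(everything(), names_to = "ind", values_to = "values") %>%
  ggplot(aes(x = ind, y = values)) +
  geom_boxplot() +
  # one panel per variable, each with its own axis range
  facet_wrap(~ind, scales = "free") +
  labs(x = NULL, y = "Value") +
  theme_minimal()

p
```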

System information

project.info
$config
$config$version
[1] "0.10.2"

$config$data_loading
[1] TRUE

$config$data_loading_header
[1] TRUE

$config$data_ignore
[1] ""

$config$cache_loading
[1] TRUE

$config$recursive_loading
[1] FALSE

$config$munging
[1] TRUE

$config$logging
[1] FALSE

$config$logging_level
[1] "INFO"

$config$load_libraries
[1] TRUE

$config$libraries
[1] "dtplyr"

$config$as_factors
[1] FALSE

$config$tables_type
[1] "data.table"

$config$attach_internal_libraries
[1] FALSE

$config$cache_loaded_data
[1] TRUE

$config$sticky_variables
[1] "NONE"

$config$underscore_variables
[1] TRUE

$config$cache_file_format
[1] "RData"


$packages
[1] "dtplyr"

$helpers
[1] "pclean.R"
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] dtplyr_1.2.0    forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7    
 [5] purrr_0.3.4     readr_2.1.1     tidyr_1.1.4     tibble_3.1.6   
 [9] ggplot2_3.3.5   tidyverse_1.3.1

loaded via a namespace (and not attached):
 [1] httr_1.4.2             jsonlite_1.7.2         viridisLite_0.4.0     
 [4] splines_4.1.2          tmvnsim_1.0-2          ffbase_0.13.3         
 [7] here_1.0.1             modelr_0.1.8           assertthat_0.2.1      
[10] highr_0.9              cellranger_1.1.0       yaml_2.2.1            
[13] pillar_1.6.4           backports_1.4.1        lattice_0.20-45       
[16] glue_1.6.0             digest_0.6.29          rvest_1.0.2           
[19] colorspace_2.0-2       psych_2.1.9            htmltools_0.5.2       
[22] Matrix_1.3-4           survey_4.1-1           pkgconfig_2.0.3       
[25] broom_0.7.11           labelled_2.9.0         haven_2.4.3           
[28] scales_1.1.1           tabplot_1.4.1          ff_4.0.5              
[31] tzdb_0.2.0             proxy_0.4-26           generics_0.1.1        
[34] farver_2.1.0           ellipsis_0.3.2         withr_2.4.3           
[37] mnormt_2.0.2           cli_3.1.0              survival_3.2-13       
[40] magrittr_2.0.1         crayon_1.4.2           readxl_1.3.1          
[43] evaluate_0.14          fs_1.5.2               fansi_0.5.0           
[46] nlme_3.1-153           xml2_1.3.3             class_7.3-19          
[49] tableone_0.13.0        tools_4.1.2            data.table_1.14.2     
[52] hms_1.1.1              mitools_2.4            lifecycle_1.0.1       
[55] ProjectTemplate_0.10.2 munsell_0.5.0          reprex_2.0.1          
[58] compiler_4.1.2         jquerylib_0.1.4        e1071_1.7-9           
[61] rlang_0.4.12           grid_4.1.2             rstudioapi_0.13       
[64] labeling_0.4.2         rmarkdown_2.11         gtable_0.3.0          
[67] DBI_1.1.2              R6_2.5.1               zoo_1.8-9             
[70] lubridate_1.8.0        knitr_1.37             bit_4.0.4             
[73] fastmap_1.1.0          utf8_1.2.2             fastmatch_1.1-3       
[76] rprojroot_2.0.2        stringi_1.7.6          parallel_4.1.2        
[79] Rcpp_1.0.7             vctrs_0.3.8            dbplyr_2.1.1          
[82] tidyselect_1.1.1       xfun_0.29             

References